chore: Plumb query results to graph server/colab callback via python globals, rather than html/javascript#205
ericfe-google wants to merge 5 commits into main from
Conversation
…avascript. WARNING: Works in Jupyter only, not colab.
Summary of Changes

Hello @ericfe-google, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces a significant architectural change in how query results are handled within the BigQuery magics and graph visualization components. By transitioning from passing query results through HTML/JavaScript parameters to using Python global variables, the system can now efficiently manage and visualize much larger datasets. This refactoring directly addresses the out-of-memory errors that previously occurred with large query outputs, improving the stability and scalability of the graph visualizer.

Highlights
Code Review
This pull request refactors the graph visualization feature to pass query results via a global Python variable instead of embedding them in HTML. This is a good change to support larger query results and avoid browser memory issues.
However, the current implementation introduces a race condition by using a single global variable to store the result. If multiple cells generating graphs are run concurrently, this could lead to incorrect data being displayed. I've suggested a more robust approach using unique IDs for each query to make the process stateless.
I've also pointed out a couple of minor issues with unused variables and dead code that could be cleaned up.
singleton_server_thread = graph_server.graph_server.init()
port = graph_server.graph_server.port

graph_server.graph_server.query_result = query_result
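For context, here is a minimal, self-contained sketch of the pattern this PR adopts: the magic stashes the result on a module-level Python object, and the visualizer callback reads it back out of Python state instead of receiving it embedded in the generated HTML/JavaScript. The class and function names below are illustrative stand-ins, not the actual bigquery_magics.graph_server API.

```python
# Illustrative sketch only; the real module is bigquery_magics.graph_server.
class GraphServer:
    def __init__(self):
        # Holds the most recent query result so callbacks can fetch it from
        # Python instead of parsing it out of a huge HTML/JS payload.
        self.query_result = None

graph_server = GraphServer()

def run_magic(query_result):
    # Magic-side: stash the result on the shared Python object.
    graph_server.query_result = query_result

def visualizer_callback():
    # Frontend callback-side: read the result back from Python state.
    return graph_server.query_result

run_magic({"nodes": [1, 2, 3], "edges": []})
assert visualizer_callback() == {"nodes": [1, 2, 3], "edges": []}
```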
Using a single global variable graph_server.graph_server.query_result to pass data to the graph visualizer introduces a potential race condition. If two notebook cells that generate graphs are run in quick succession, the second cell could overwrite query_result before the first cell's visualization has had a chance to fetch its data, leading to the first visualization displaying the second one's data.
A more robust solution would be to avoid this shared state. For example, you could store query results in a dictionary keyed by a unique ID, as sketched after this list:

- Generate a unique ID for each query.
- Store the query result in a dictionary on the graph_server object, e.g. graph_server.graph_server.results[query_id] = query_result.
- Pass this query_id to the frontend via the params argument of generate_visualization_html.
- The frontend callback would then include this query_id when calling back to Python, allowing _colab_query_callback and handle_post_query to retrieve the correct result.

This would make the visualization mechanism stateless and safe for concurrent executions.
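A rough sketch of this stateless variant, assuming a hypothetical results dictionary and query_id parameter (neither exists in the current code); the real wiring through params and the frontend is omitted:

```python
# Hypothetical sketch of the reviewer's keyed-results suggestion.
import uuid

class GraphServer:
    def __init__(self):
        # Maps a per-query id to its result, so concurrent cells cannot
        # overwrite each other's data.
        self.results = {}

graph_server = GraphServer()

def run_magic(query_result):
    query_id = str(uuid.uuid4())
    graph_server.results[query_id] = query_result
    # In the real flow, query_id would travel to the frontend via the
    # `params` argument of generate_visualization_html.
    return query_id

def visualizer_callback(query_id):
    # The frontend echoes the id back; pop the matching result so entries
    # do not accumulate.
    return graph_server.results.pop(query_id)

first = run_magic({"nodes": ["a"]})
second = run_magic({"nodes": ["b"]})
assert visualizer_callback(first) == {"nodes": ["a"]}
assert visualizer_callback(second) == {"nodes": ["b"]}
```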
Not a bad suggestion, but I have not encountered a notebook environment that allows more than one cell to run at a time, so it's not a huge concern for me.
@@ -633,8 +633,9 @@ def _handle_result(result, args):

def _colab_query_callback(query: str, params: str):
bigquery_magics/graph_server.py (Outdated)
@@ -251,7 +252,9 @@ def handle_post_ping(self):

def handle_post_query(self):
    data = self.parse_post_data()
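For illustration, a hedged sketch of how a POST handler could resolve the heavy result from Python state after this change; the handler shape and field names below are assumptions, not the actual handle_post_query implementation:

```python
# Assumption-laden toy: shows the pattern, not bigquery_magics internals.
import json

class FakeGraphServerState:
    def __init__(self):
        self.query_result = None  # populated by the magic before rendering

state = FakeGraphServerState()

class FakeRequestHandler:
    def __init__(self, body: str):
        self._body = body

    def parse_post_data(self):
        # Lightweight request parameters still arrive via the POST body...
        return json.loads(self._body)

    def handle_post_query(self):
        params = self.parse_post_data()
        # ...but the (potentially large) query result is read from Python
        # state rather than shipped through the generated HTML/JavaScript.
        result = state.query_result
        return {"params": params, "num_rows": len(result)}

state.query_result = [{"id": 1}, {"id": 2}]
handler = FakeRequestHandler('{"view": "graph"}')
print(handler.handle_post_query())  # {'params': {'view': 'graph'}, 'num_rows': 2}
```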
chore: Plumb query results to graph server/colab callback via python globals, rather than html/javascript
This allows the graph visualizer to handle larger query results without running out of memory due to the generated HTML becoming too large.